Link to Shiny App

36-315 Final Project

Group 5
Hannah Worrall, Audris Wong, Kevin Yang, Tony Zhang, Todd Zolynas

1 Introduction

Using data on Houston, Texas, from the 2000 and 2010 US Censuses, supplemented by the 2011 American Community Survey (ACS), we visualize how the population changed over those 10 years, focusing on race and age groups at both the tract and block group levels. The supplemental ACS data also allowed us to analyze income across age and race groups.
Because tracts and block groups are defined to represent the population of an area at a macro scale, their boundaries are revised with each census, so the number of tracts and block groups is not constant between years. For example, in 2000 there were 889 tract and 2,710 block group entries associated with Houston, whereas in 2010 there were 1,066 tract and 3,001 block group entries, reflecting boundaries redefined for the new census. Although some boundaries changed and some tracts and block groups were added or removed, the data are still useful for visualizing trends across the 10 years.

2 Plots

2.1 Tract Level - General Population Choropleth

Our first map shows the total population of each tract in Houston in 2000 and 2010. Blue areas correspond to low population and red areas to high population. The first noteworthy feature of the maps is that the maximum tract population changed dramatically: in 2000 the most populated tract had 18,550 people, while in 2010 it had 33,201. This reflects the growth of Houston's population between 2000 and 2010; in general, the upper limit of each population quartile is higher in 2010 than in 2000. Overall, the total population of Houston increased from 4.7 million in 2000 to 5.9 million in 2010.
The maps also show the population moving from the center of the city to the outlying tracts: in 2000 the most populated tracts were scattered throughout the city, while in 2010 they tend to lie on its outskirts. We split the population into quartiles instead of using a continuous gradient because population is a highly skewed variable; a gradient would color most tracts similarly, whereas binning reveals more of the differences between them. A map of the change in population would show the 2000-to-2010 change directly, but because the tracts were not the same in both years this would have been difficult. Histograms, violin plots, or density plots could also have shown the distribution of tract populations in each year, but since we were interested in the geographic distribution of the population as well as its overall change, these options were suboptimal.
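The argument for quartiles over a gradient can be illustrated with a short Python sketch. This is not the project's own code, which the report does not show; the function names and the tract populations are hypothetical (only the 18,550 maximum comes from the 2000 map), and values equal to a cut point are assumed to go to the higher class. Calling `quantile_bins` with `n_bins=6` gives sextiles like those used for income in 2.7.

```python
import bisect
import statistics

def quantile_bins(values, n_bins=4):
    """Assign each value to a quantile class 0..n_bins-1 (0 = lowest).

    With n_bins=4 the cut points are quartiles; with n_bins=6, sextiles.
    """
    cuts = statistics.quantiles(values, n=n_bins, method="inclusive")
    # bisect_right places a value equal to a cut point in the higher class.
    classes = [bisect.bisect_right(cuts, v) for v in values]
    return cuts, classes

def linear_positions(values):
    """Position of each value on a continuous min-to-max color gradient."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical tract populations, right-skewed like the real data.
populations = [1200, 1500, 1800, 2100, 2500, 3000, 4200, 9000, 18550]
cuts, classes = quantile_bins(populations)
# cuts == [1800.0, 2500.0, 4200.0]; classes == [0, 0, 1, 1, 2, 2, 3, 3, 3]
near_bottom = sum(p < 0.2 for p in linear_positions(populations))
# 7 of the 9 tracts fall in the lowest fifth of a linear gradient.
```

On the linear gradient, seven of the nine tracts would receive nearly the same color, while the quartile classes spread them across all four colors.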

2.2 Tract Level - White Population Proportion Choropleth

Our second map shows the proportion of white residents in each Houston tract in 2000 and 2010. In both years, white residents are the majority racial group in most tracts outside the city center. They also make up the majority of several tracts in the western part of the city center (these were high-income tracts). There do not appear to be any significant geographic changes in the proportion of white residents over time. We used a gradient for this map because the proportion of white residents is not highly skewed, so a gradient captures much of the variation across tracts. A map of the change in the percentage of white residents would show differences between 2000 and 2010 directly, but because the tracts changed between the two years this would have been difficult. Histograms, boxplots, violin plots, or density plots could also have described the distribution of the percentage of white residents across tracts, but since we were interested in the geographic distribution of white residents, these would have discarded a very important part of the data.
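A continuous gradient maps each tract's proportion directly to a color. A minimal sketch, assuming a simple linear interpolation between two RGB endpoints; the light-to-dark blue endpoint colors, the helper names, and the example tract counts are hypothetical rather than the palette used in the app.

```python
def proportion(part, total):
    """Share of a tract's population in one group; None for empty tracts."""
    return None if total == 0 else part / total

def gradient_color(p, low=(247, 251, 255), high=(8, 48, 107)):
    """Linearly interpolate an RGB color for a proportion p in [0, 1].

    The endpoint colors are hypothetical (a light-to-dark blue ramp).
    """
    p = min(max(p, 0.0), 1.0)
    return tuple(round(a + (b - a) * p) for a, b in zip(low, high))

def to_hex(rgb):
    """Format an (r, g, b) tuple as a hex color string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)

# A hypothetical tract with 3,150 white residents out of 4,200.
share = proportion(3150, 4200)          # 0.75
color = to_hex(gradient_color(share))   # "#446390"
```

Because the proportion is bounded in [0, 1] and not heavily skewed, a linear ramp like this uses the full color range without binning.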

2.3 Tract Level - African American Population Proportion Choropleth

In 2000, Houston's African American population was located primarily in the northeastern and southeastern districts of the city center. In 2010, the geographic distribution of the African American population is very similar, although small changes are visible, such as an increased presence in the northern suburbs of Houston and in the suburbs generally. One interesting case is a very small, primarily African American tract inside a primarily white area in the northwest; when we researched it using Google Maps, we found no obvious explanation. The small differences in the proportion of African Americans could be visualized more effectively with a population change plot for 2010 instead of a second population proportion choropleth. Contour plots could have been used to track changes in the African American population, but they would make it difficult to judge the proportion for each tract.

2.4 Tract Level - Asian Population Proportion Choropleth

In 2000, very few Asian residents lived outside central and southwest Houston, and very few outlying rural areas had any significant proportion of Asian residents. Unlike African Americans, Asian residents in 2000 appear to live outside the city center itself, tending to reside in the suburbs. Comparing this with the 2010 choropleth, the pattern is relatively stable, although the tract-level proportion of Asian residents increases overall in the areas described. The outlying areas show slight increases, but the difference is negligible. We found no anomalies in the distribution of Houston's Asian population. Contour plots could have shown changes in population density, but contour plots of tract data are cluttered and difficult to read, and a poorly chosen bandwidth could produce misleading results.
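The bandwidth concern can be made concrete with a two-dimensional kernel density estimate, the usual basis of density contours. This is a sketch under stated assumptions: it uses an isotropic Gaussian kernel (a standard choice, not necessarily what a given contour tool uses), and the centroid coordinates and helper name are hypothetical.

```python
import math

def kde2d(points, x, y, h):
    """Isotropic Gaussian kernel density estimate at (x, y) with bandwidth h."""
    norm = 1.0 / (len(points) * 2.0 * math.pi * h * h)
    return norm * sum(
        math.exp(-0.5 * ((x - px) ** 2 + (y - py) ** 2) / (h * h))
        for px, py in points
    )

# Hypothetical centroids: a tight cluster near (0, 0) and one tract at (5, 5).
centroids = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]

# Ratio of density at the cluster to density in the empty gap at (2.5, 2.5).
ratios = {
    h: kde2d(centroids, 0.0, 0.0, h) / kde2d(centroids, 2.5, 2.5, h)
    for h in (0.5, 3.0)
}
```

With the small bandwidth the empty gap has essentially zero density, while with the large one it is about two thirds as dense as the cluster itself, so the resulting contours would suggest residents in an area that has none.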

2.5 Tract Level - Hispanic/Native American/Pacific/Others Population Proportion Choropleth

We wanted to examine whether the geographic distribution of people self-identifying as Hispanic, Native American, Pacific Islander, or another ethnicity changed between 2000 and 2010. Combining Hispanics with all other minorities ensures that every population group in Houston appears in one of our maps and gives a general view of changes in the geographic distribution of minorities.
Comparing the side-by-side choropleths, we see that in 2000 the highest concentrations of Hispanics/others are in the inner city, while the surrounding suburbs have much lower concentrations. Among the suburbs, those west and south of the city center have higher proportions of Hispanics/others than those east and north of it. In 2010, the high proportions of Hispanics/others in the inner city are relatively unchanged, but the proportion increases in the suburban areas north, west, and east of the city center. This agrees with plot 2.1, the general population choropleth, which shows a similar increase in total population in the suburban areas. A choropleth with a gradient for population proportion displays tract-level information more effectively than a contour plot or scatter plot can for this type of data.

2.6 Block Group Level - Average Age Choropleth

We use a choropleth to display the average age of each block group in 2011 because it makes it easy to see whether specific areas of Houston have a particular average age. However, using a choropleth of average age rather than a 3D surface plot with an adjustable bandwidth makes overarching geographic age trends more difficult to observe.
Examining the plot, we see that the city center is populated by residents of all ages. Moving out of the city center, the suburban areas close to it are still mostly young (average age between 20 and 34). Farther from the city center, average ages are much higher: the northernmost and southernmost suburbs contain the most block groups of Houston's oldest residents (average age between 50 and 70). Interestingly, there are also scattered block groups of abnormally young residents, with average ages between 0 and 24 (most noticeably in the beach area southeast of the city center). We concluded that these block groups were subject to self-reporting error, as the census data was only meant to be collected from individuals 18 and over.

2.7 Block Group Level - Average Income Choropleth

We used a block-group-level map of the Houston area, colored by six income sextiles, to visualize the relationship between income and location. Since the income distribution is heavily skewed, it is more helpful to split incomes into bins than to shade block groups relative to the maximum income. An alternative would have been to visualize the income data alone, for example with a single-variable boxplot or violin plot, since the income data are continuous. Visualizing income on a map lets us identify specific regions associated with income effects, but it hides features such as the number of measurements in each income sextile. A box or violin plot would have presented these numerical summaries succinctly, at the cost of losing the association between income and location.
We observe a very dense L-shaped region within Houston proper associated with the highest income sextile, and large swaths of block groups surrounding Houston also have high incomes. Cross-referencing this income map with our ethnicity maps, we observe that, in general, block groups with higher proportions of non-white residents tend to have lower incomes. Also, on closer inspection of the data, the large black areas (the lowest sextile) along the coast and at the southeastern end of the map appear to be driven by only a few low-income measurements in each area.

2.8 Tract Level - Live Alone Population Proportion Choropleth

We used a choropleth map of 1-person households to visualize the relationship between location and living alone, coloring each block group by its count relative to 2000. We made this coloring choice because the number of 1-person households differs greatly between 2000 and 2010, and giving each plot its own default color scale would have obscured comparisons between the two. Alternatives for visualizing the population living alone include bar plots (number of people living alone in 2000 vs. 2010) and density plots of 1-person households against longitude and latitude. The map lets us identify which areas have higher numbers of people living alone, but it gives no sense of the actual number of 1-person households. A bar plot would have shown the numeric change in 1-person households, but we would have lost the ability to see where in Houston these measurements came from.
We observe that, in general, there are not many 1-person households in Houston. In 2000, there are a few flecks of non-blue color (high concentrations of 1-person households) on the western side of downtown Houston, with various shades of blue throughout the rest of the region. The area with the most 1-person households also appears to be the part of downtown Houston with the highest-income block groups. We hypothesize that these block groups could be home to young workers in high-paying white-collar jobs, such as financial services or law. In 2010, the picture is starkly different. The map is almost entirely dark blue except for a few lighter block groups, which means that relative to 2000 there are vastly fewer 1-person households in nearly every block group, including the downtown area where the most 1-person households had been observed. We have no information with which to explain this change; one possibility is that rent in the Houston area increased, leading many people to seek roommates. It is also possible that some people living alone acquired partners during the 10 years, but that does not explain why no new 1-person households appeared in that time.
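One way to color both years "relative to 2000" is a shared color scale anchored on the 2000 range. The sketch below assumes that reading of the text; the function name and the household counts are hypothetical.

```python
def shared_scale(values_2000, values_2010):
    """Map both years' counts onto [0, 1] using the 2000 range only.

    Values outside the 2000 range are clipped, so the two maps share one
    legend and a given color means the same count in both years.
    """
    lo, hi = min(values_2000), max(values_2000)
    span = hi - lo or 1  # guard against a constant 2000 series

    def scale(v):
        return min(max((v - lo) / span, 0.0), 1.0)

    return [scale(v) for v in values_2000], [scale(v) for v in values_2010]

# Hypothetical 1-person household counts for three block groups.
counts_2000 = [10, 40, 110]
counts_2010 = [5, 12, 30]
s2000, s2010 = shared_scale(counts_2000, counts_2010)
# s2000 == [0.0, 0.3, 1.0]; s2010 == [0.0, 0.02, 0.2]
```

Scaling each year on its own range would instead give the 2010 maximum of 30 the same darkest color as the 2000 maximum of 110, hiding the decline.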

2.9 Block Group Level - Violin Plot of Race versus Income

We used a violin plot to visualize a basic density association between income and each of four race groups (Black, White, Asian, Hispanic). Because the income data were given per block group rather than per race, the plot is not a definitive "income by race" visualization. From our data set, we computed the percentage of each block group belonging to each of the four races; for example, one block group could be 13% Asian, 50% White, and so forth. With these percentages stored in four new columns, we then found the block groups in the top 15% for each race's percentage; in other words, which 15% of neighborhoods had the highest proportions of White, Black, Hispanic, or Asian residents? For these top-15% neighborhoods of each race, we then plotted income, computed as the average of the average male income and the average female income. As a result, some neighborhoods can be counted in more than one group, although this is not a source of error: a neighborhood in the top 15% for Black residents might also be in the top 15% for Asian residents, in which case its income is counted toward both races.
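The selection procedure above can be sketched in Python. This is an illustration, not the project's code: the dictionary field names and the function name are hypothetical, and the "top 15%" cutoff is assumed to be the 85th percentile of each race's percentage, with block groups at or above it kept.

```python
import statistics

RACES = ("white", "black", "asian", "hispanic")

def race_top_incomes(block_groups, top_share=0.15):
    """For each race, collect incomes of block groups in the top `top_share`
    of that race's population percentage.

    Each block group is a dict with a 'total' population, one count per race,
    and 'male_income' / 'female_income' averages (hypothetical field names).
    A block group can land in several races' lists, mirroring the double
    counting described in the text.
    """
    k = round(100 * (1 - top_share))  # 85 -> the 85th percentile
    result = {}
    for race in RACES:
        pcts = [bg[race] / bg["total"] for bg in block_groups]
        cutoff = statistics.quantiles(pcts, n=100, method="inclusive")[k - 1]
        result[race] = [
            # Income as the plain average of male and female average income.
            (bg["male_income"] + bg["female_income"]) / 2
            for bg, pct in zip(block_groups, pcts)
            if pct >= cutoff
        ]
    return result
```

Each list in the result would feed one violin; a block group that ranks highly for two races contributes its income to both.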
The plot shows that median neighborhood income rises across the groups in the order Hispanic, Black, Asian, White. Hispanic neighborhoods have an approximate median income of $20,000, Black neighborhoods about $25,000, Asian neighborhoods about $40,000, and White neighborhoods about $45,000. Notable outliers appear in the White group, with block groups associated with incomes over $200,000. For all race groups, the density increases up to values slightly below the median and then decreases as income increases further. We chose a bandwidth of 0.8 because it balanced the visualization of density trends within each race's income distribution, neither over-smoothing and hiding these trends nor under-smoothing and presenting a distracting level of noise.
Alternatives include bean plots, box plots, or any other plot that shows a continuous variable across multiple categories. We chose violin plots because they provide the five-number summary of a basic box plot together with a density estimate whose smoothness is adjustable through the bandwidth.
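The report does not say which software parameter the 0.8 refers to. In ggplot2's `geom_violin`, for example, `adjust` multiplies a default bandwidth computed by R's `bw.nrd0` (Silverman's rule of thumb). The sketch below assumes that interpretation and reimplements a Gaussian kernel density estimate in Python; the function names and income values are hypothetical.

```python
import math
import statistics

def bw_nrd0(x):
    """Silverman's rule-of-thumb bandwidth, following R's bw.nrd0."""
    sd = statistics.stdev(x)
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    # Fall back as bw.nrd0 does when the spread measure is zero.
    lo = min(sd, (q3 - q1) / 1.34) or sd or abs(x[0]) or 1.0
    return 0.9 * lo * len(x) ** -0.2

def gaussian_kde(x, adjust=1.0):
    """Return (density function, bandwidth) with bandwidth = adjust * bw_nrd0(x)."""
    h = adjust * bw_nrd0(x)
    c = 1.0 / (len(x) * h * math.sqrt(2.0 * math.pi))

    def density(t):
        return c * sum(math.exp(-0.5 * ((t - xi) / h) ** 2) for xi in x)

    return density, h

# Hypothetical block group incomes, in thousands of dollars.
incomes_k = [18, 20, 21, 24, 26, 30, 45]
density, h = gaussian_kde(incomes_k, adjust=0.8)
```

An `adjust` below 1 narrows every kernel, so the estimated density follows individual block groups more closely; values well below 0.8 would show noise, and values well above it would smooth away the peak below the median.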

2.10 Block Group Level - Violin Plot of Age versus Income

As with race versus income, we used violin plots to show the relationship between age and income. This plot required less data manipulation than the previous one and is therefore easier to interpret: we did not need quantile cutoffs for particular groups, because each block group already has an associated average age. We chose age cutoffs that roughly correspond to career stages: ages 12-18 represent childhood, 18-23 college/student age, 23-35 young professional age, 35-55 adult or family age, and 55+ late professional or retired age.
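The age grouping and the per-group medians discussed next can be sketched as follows. The ranges in the text share endpoints (18, 23, 35, 55), so the sketch assumes each lower edge is inclusive and excludes average ages below 12; the helper names and example rows are hypothetical.

```python
import bisect
import statistics

# Lower edges of the age groups in the text; each edge is assumed inclusive
# (an average age of exactly 23 goes to "23-35").
EDGES = [12, 18, 23, 35, 55]
LABELS = ["12-18", "18-23", "23-35", "35-55", "55+"]

def age_group(avg_age):
    """Label for a block group's average age, or None if below 12."""
    i = bisect.bisect_right(EDGES, avg_age) - 1
    return LABELS[i] if i >= 0 else None

def median_income_by_group(rows):
    """rows: iterable of (average_age, income) pairs, one per block group."""
    groups = {}
    for age, income in rows:
        label = age_group(age)
        if label is not None:
            groups.setdefault(label, []).append(income)
    return {label: statistics.median(v) for label, v in groups.items()}
```

For example, `median_income_by_group([(30, 20000), (31, 30000), (60, 40000)])` returns `{"23-35": 25000.0, "55+": 40000}`.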
The most immediate trend in this plot is that as a block group's average age increases, so does its associated income. Ages 12-23 are associated with a median income of about $20,000, ages 23-35 about $25,000, ages 35-55 about $35,000, and 55+ about $40,000. Notably, the outliers with incomes over $200,000 are in the 35-55 age group rather than 55+, indicating that the highest-earning neighborhoods are associated with adult professional age rather than late professional or retired age. However, the median and IQR are still greater overall in the 55+ group than in the 35-55 group, which supports treating the values above $200,000 as outliers.
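The report does not state how outliers were identified. Box plots conventionally flag values more than 1.5 times the IQR beyond the quartiles, and a minimal sketch of that rule follows; the function name and the income values are hypothetical.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return values more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical block group incomes in one age group.
incomes = [22000, 28000, 30000, 35000, 38000, 41000, 210000]
outliers = tukey_outliers(incomes)  # [210000]
```

Here the quartiles are 29,000 and 39,500, so anything above 55,250 is flagged; a single very high-income block group falls far outside that fence.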
The presence of a 12-18 age group in our block group data also deserves discussion. These entries appear in the data set, but their interpretation is open to debate. There may be block groups where the average age really is under 18, perhaps because they are populated by families with many children, or contain orphanages, and so forth. Alternatively, some of these ages could result from data entry error.
We chose a bandwidth of 0.8 to best visualize the densities in the violin plots, as higher and lower values did not balance the density approximation as well. Both over-smoothing and under-smoothing were undesirable: the former would mask the density trends, and the latter would add too much visual noise to be easily interpretable.
As with the race versus income plot, alternatives include bean plots, box plots, or any other plot that shows a continuous variable across multiple categories, and we chose violin plots for the same reason: they provide the five-number summary of a basic box plot together with a density estimate whose smoothness is adjustable through the bandwidth.

3 Conclusion

We believe choropleths generally give the best picture when analyzing changes in population over time. One improvement that could have been made to all the choropleths would have been to exclude the shoreline, Trinity Bay, and Galveston Bay areas from the 2010 data, as no data was collected for these tracts and block groups in 2000 and thus no comparison can be made. Removing or hiding these areas would make the apparent differences there over time less stark and less misleading. Additionally, some choropleths, such as the 2010 African American and 2010 Hispanic/Other population proportion maps, could have been replaced with population difference choropleths to display geographic changes in population distribution more effectively.
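A population difference choropleth needs one value per tract present in both years. A minimal sketch, assuming each year's proportions are keyed by a tract identifier (the function name and the IDs are hypothetical): only identifiers found in both years are differenced, so new or redrawn tracts, such as shoreline and bay tracts with no 2000 data, would drop out rather than be compared against nothing. Tracts whose boundaries changed would still need a crosswalk between the two years, which this sketch does not attempt.

```python
def proportion_difference(prop_2000, prop_2010):
    """Per-tract change in a population proportion, 2010 minus 2000.

    prop_2000 and prop_2010 map a tract identifier to a proportion. Only
    identifiers present in both years are kept; tracts that exist in just
    one year are dropped rather than compared against a missing value.
    """
    shared = prop_2000.keys() & prop_2010.keys()
    return {tract: prop_2010[tract] - prop_2000[tract] for tract in sorted(shared)}

# Hypothetical tracts: "C" disappears after 2000 and "D" is new in 2010.
prop_2000 = {"A": 0.20, "B": 0.50, "C": 0.10}
prop_2010 = {"A": 0.25, "B": 0.45, "D": 0.30}
change = proportion_difference(prop_2000, prop_2010)
# change has keys "A" (about +0.05) and "B" (about -0.05)
```

Mapping these differences with a diverging color scale centered on zero would show directly where a group's share grew or shrank.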
After thorough analysis of the provided data, we come to the following conclusions: